PRIS at Chinese Language Processing

Authors

  • Jiayue Zhang
  • Yichao Cai
  • Si Li
  • Weiran Xu
  • Jun Guo
Abstract

As the volume of Chinese-language text grows, the “same personal name” problem, in which one name refers to several different people, demands more attention. Our personal name disambiguation system applies hierarchical agglomerative clustering and uses named entities as the features for computing document similarity. We propose a two-stage strategy: the first stage performs word segmentation and named entity recognition (NER) for feature extraction, and the second stage performs the clustering.
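
The second stage described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the linkage criterion, or the stopping rule, so Jaccard similarity over entity sets, average linkage, and the `threshold` value below are all assumptions made for illustration, and the entity sets in `docs` are hypothetical stand-ins for stage-one (segmentation and NER) output.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of named entities (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_similarity(c1, c2, features):
    """Average linkage (assumed): mean pairwise similarity between two clusters."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(jaccard(features[i], features[j]) for i, j in pairs) / len(pairs)


def agglomerative_cluster(features, threshold=0.2):
    """Repeatedly merge the most similar pair of clusters.

    Stops when the best available similarity falls below `threshold`
    (a hypothetical stopping rule, not taken from the paper).
    """
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            sim = cluster_similarity(clusters[a], clusters[b], features)
            if sim > best_sim:
                best_sim, best_pair = sim, (a, b)
        if best_sim < threshold:
            break
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical entity sets for four documents mentioning the same personal name.
docs = [
    {"Beijing", "Tsinghua University"},
    {"Beijing", "Tsinghua University", "Physics"},
    {"Shanghai", "Basketball"},
    {"Shanghai", "Basketball", "NBA"},
]
result = agglomerative_cluster(docs)
print(result)
```

Each resulting cluster is meant to group the documents that refer to one real-world person. In practice the threshold controls how many distinct people the system reports, so it would need tuning on annotated data.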

Similar resources

Mainland Chinese Students’ Shifting Perceptions of Chinese-English Code-Mixing in Macao

As a former Portuguese colony, Macao is the only region in China where Cantonese, a variety of Chinese, and English, an international language, are enjoying de facto official statuses, with Putonghua being a quasi-official language and Portuguese being another official language. Recently, with an increasing number of Mainland Chinese students crossing the border to pursue their tertiar...

Determination of 5-Hydroxymethyl-2-furaldehyde of Crude and Processed Fructus Corni in Freely Moving Rats Using In Vivo Microdialysis Sampling and Liquid Chromatography

The purpose of this study was to develop a sensitive and fast microdialysis coupled with high-performance liquid chromatographic (HPLC) method for determination of 5-hydroxymethyl-2-furaldehyde (5-HMF) in free-moving rats after i.g. administration of the aqueous extract of crude Fructus Corni and its processed products of jiuzheng pin (JZP). The concentration of 5-HMF in free-movi...

What You Need to Know about Chinese for Chinese Language Processing

The synergy between language sciences and language technology has been an elusive one for the computational linguistics community, especially when dealing with a language other than English. The reasons are two-fold: the lack of an accessible comprehensive and robust account of a specific language so as to allow strategic linking between a processing task to linguistic devices, and the lack of ...

Research on Chinese discourse rhetorical structure representation scheme and corpus annotation

It is well-known that interpretation of a text requires understanding of its rhetorical relation hierarchy since discourse units rarely exist in isolation. Such discourse structure is fundamental to document-level applications, such as text understanding, summarization, knowledge extraction and question-answering. In comparison with English, there are only a few studies on Chinese discourse ana...

HMM and CRF Based Hybrid Model for Chinese Lexical Analysis

This paper presents the Chinese lexical analysis systems developed by Natural Language Processing Laboratory at Dalian University of Technology, which were evaluated in the 4th International Chinese Language Processing Bakeoff. The HMM and CRF hybrid model, which combines character-based model with word-based model in a directed graph, is adopted in system developing. Both the closed and open t...

Publication date: 2010